Red Wine Quality Exploration

This report explores how the quality of red wine is affected by its chemical components.

Univariate Plot Section

To begin, let’s explore the structure of the data.

## [1] 1599   13
## 'data.frame':    1599 obs. of  13 variables:
##  $ X                   : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ fixed.acidity       : num  7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
##  $ volatile.acidity    : num  0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
##  $ citric.acid         : num  0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
##  $ residual.sugar      : num  1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
##  $ chlorides           : num  0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
##  $ free.sulfur.dioxide : num  11 25 15 17 11 13 15 15 9 17 ...
##  $ total.sulfur.dioxide: num  34 67 54 60 34 40 59 21 18 102 ...
##  $ density             : num  0.998 0.997 0.997 0.998 0.998 ...
##  $ pH                  : num  3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 ...
##  $ sulphates           : num  0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
##  $ alcohol             : num  9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
##  $ quality             : int  5 5 5 6 5 5 5 7 7 5 ...
## 
##   3   4   5   6   7   8 
##  10  53 681 638 199  18
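
Output like the above typically comes from dim(), str() and table(). The sketch below reproduces those calls on the first ten rows only, copied from the str() output; the name redwine_head and the choice of five columns are my own assumptions, since the original code chunk is not shown.

```r
# First ten rows of five columns, copied from the str() output above.
# The full data frame has 1599 rows and 13 columns.
redwine_head <- data.frame(
  fixed.acidity    = c(7.4, 7.8, 7.8, 11.2, 7.4, 7.4, 7.9, 7.3, 7.8, 7.5),
  volatile.acidity = c(0.7, 0.88, 0.76, 0.28, 0.7, 0.66, 0.6, 0.65, 0.58, 0.5),
  citric.acid      = c(0, 0, 0.04, 0.56, 0, 0, 0.06, 0, 0.02, 0.36),
  alcohol          = c(9.4, 9.8, 9.8, 9.8, 9.4, 9.4, 9.4, 10, 9.5, 10.5),
  quality          = c(5L, 5L, 5L, 6L, 5L, 5L, 5L, 7L, 7L, 5L)
)

dim(redwine_head)   # number of rows and columns
str(redwine_head)   # column types and leading values

# Number of wines at each quality level
quality_counts <- table(redwine_head$quality)
quality_counts
```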

The summary statistics for each variable:

Fixed.Acidity

Description: most acids involved with wine are fixed or nonvolatile (do not evaporate readily).

## [1] "Fixed.Acidity Statistics:"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    4.60    7.10    7.90    8.32    9.20   15.90

Fixed acidity (tartaric acid) has a slight right tail, but overall I think it is close to a normal distribution, with a mean of 8.32 g/dm3 and a median of 7.9 g/dm3:

  • The distribution peaks around 7.2 g/dm3.
  • The minimum value of fixed.acidity is 4.6 g/dm3.
  • When we change the binwidth, we can see clear gaps between fixed.acidity values of 14 and 16. I wonder why these gaps exist, and whether they have any effect on wine quality.
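
To illustrate why gaps only appear at some binwidths, here is a minimal sketch that counts observations per bin without drawing a plot. It uses only the ten fixed.acidity values printed in the str() output, and bin_counts is a helper name I made up for this example.

```r
# Fixed acidity of the first ten wines, from the str() output above.
fixed_acidity <- c(7.4, 7.8, 7.8, 11.2, 7.4, 7.4, 7.9, 7.3, 7.8, 7.5)

# Count observations per bin for a given binwidth (no plot is drawn).
bin_counts <- function(x, binwidth) {
  lower  <- floor(min(x))
  # Extend the upper edge so the last bin always covers max(x).
  upper  <- lower + binwidth * ceiling((max(x) - lower) / binwidth)
  breaks <- seq(lower, upper, by = binwidth)
  hist(x, breaks = breaks, plot = FALSE)$counts
}

wide   <- bin_counts(fixed_acidity, binwidth = 5)  # one coarse bin hides any gap
narrow <- bin_counts(fixed_acidity, binwidth = 1)  # empty bins show up as zeros
wide
narrow
```

With the narrow bins, the zeros between the main cluster and the single high value are the kind of gap the histogram reveals.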

Volatile.Acidity

Description: the amount of acetic acid in wine, which at too high levels can lead to an unpleasant, vinegar taste.

## [1] "Volatile.Acidity Statistics Summary:"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1200  0.3900  0.5200  0.5278  0.6400  1.5800

The volatile.acidity histogram shows a right-skewed distribution. I believe volatile acidity values must be small because the more acetic acid a wine contains, the more unpleasant it tastes; that would explain why most values are below 1.0 g/dm3. I wonder what the quality is for wines with more than 1 g/dm3 of acetic acid.

Citric.Acid

Description: found in small quantities, citric acid can add ‘freshness’ and flavor to wines

## [1] "Citirc.Acid Statistics Summary:"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.090   0.260   0.271   0.420   1.000
## [1] "Top 5 values for Citric Acid:"
## 
##    0 0.49 0.24 0.02 0.26 
##  132   68   51   50   38

The citric.acid histogram shows a right-skewed distribution with peaks around 0.0, 0.25 and 0.5 g/dm3. I expected most wines to contain citric acid, but it seems I was wrong: the most common single value is 0 g/dm3 (132 of the 1599 wines). Does that mean quality decreases when a wine has no citric acid, since it might lack ‘freshness’ and flavor? We will see how wine quality is affected by this in the next sections.

Residual.Sugar

Description: the amount of sugar remaining after fermentation stops; it’s rare to find wines with less than 1 gram/liter, and wines with greater than 45 grams/liter are considered sweet.

The residual.sugar histogram shows a right-skewed distribution. It also shows that a few wines have less than 1 g/dm3 of sugar (2 observations have 0.9 g/dm3). Most wines have a residual sugar value between 1.2 and 3.5 g/dm3.

## [1] "Wines with less than 1g/dm3 of sugar amount:"
## 
## 0.9 
##   2

Chlorides

Description: the amount of salt in the wine.

## [1] "Chlorides Statistics Summary:"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.01200 0.07000 0.07900 0.08747 0.09000 0.61100

The chlorides histogram shows a right-skewed distribution that peaks around 0.08 g/dm3. Most wines have a chlorides amount between 0.03 and 0.125 g/dm3. The histogram also shows gaps around 0.3, 0.425 and 0.55 g/dm3, and the maximum value is 0.611 g/dm3. I wonder how chlorides affect wine quality.

Let’s subset the data and look at the quality of wines with more than 0.35 g/dm3 of chlorides. None of these wines has a quality level of 8. I think the chlorides amount might affect wine quality. We will try to figure that out in the next sections.

## 
##  4  5  6  7 
##  1 13  3  1
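
To put this small subset in context, the sketch below turns the counts above into percentages and estimates how many level-8 wines we would expect among 18 wines if chlorides made no difference. The subset call in the comment is my guess at how the table was produced, not the original code.

```r
# Quality counts for wines with chlorides > 0.35 g/dm3, from the table above.
# Assumed origin: table(subset(redwine, chlorides > 0.35)$quality)
high_chloride_quality <- c("4" = 1, "5" = 13, "6" = 3, "7" = 1)

n_subset  <- sum(high_chloride_quality)
share_pct <- round(100 * high_chloride_quality / n_subset, 1)
share_pct

# Overall share of level-8 wines (18 of 1599), applied to a subset this small
expected_level8 <- n_subset * 18 / 1599
expected_level8
```

The expected count is well below one wine, so the absence of level 8 in this subset is weak evidence on its own; the large share of level-5 wines is the more noticeable feature.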

Free.Sulfur.Dioxide

Description: the free form of SO2 exists in equilibrium between molecular SO2 (as a dissolved gas) and bisulfite ion; it prevents microbial growth and the oxidation of wine.

## [1] "Free.Sulfur.Dioxide Statistics Summary:"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    7.00   14.00   15.87   21.00   72.00

The free.sulfur.dioxide histogram shows a right-skewed distribution that peaks around 6 mg/dm3. The majority of wines have between 2.5 and 32.5 mg/dm3 of free sulfur dioxide. I want to explore the quality of wines that have more than 65 mg/dm3 of free sulfur dioxide. Do they have a high quality ranking?

Total.Sulfur.Dioxide

Description: amount of free and bound forms of SO2; in low concentrations, SO2 is mostly undetectable in wine, but at free SO2 concentrations over 50 ppm, SO2 becomes evident in the nose and taste of wine.

## [1] "Total.Sulfur.Dioxide Summary Statistics (g/dm3):"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.00600 0.02200 0.03800 0.04647 0.06200 0.28900

The total.sulfur.dioxide histogram shows a right-tailed distribution in which all values are below 0.3 g/dm3. The distribution peaks around 0.028 g/dm3. The majority of wines have between 0.01 and 0.085 g/dm3 of total sulfur dioxide.

Density

Description: the density of wine is close to that of water, depending on the percent alcohol and sugar content.

## [1] "Density Summary Statistics (g/cm3)"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.9901  0.9956  0.9968  0.9967  0.9978  1.0040

The density histogram shows a normal distribution with a mean of 0.997 g/cm3. There are some gaps in both the right and left tails. I wonder how density affects wine quality. I believe the closer it gets to the density of water, the higher the quality.

PH

Description: describes how acidic or basic a wine is on a scale from 0 (very acidic) to 14 (very basic); most wines are between 3-4 on the pH scale

## [1] "PH Summary Statistics:"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.740   3.210   3.310   3.311   3.400   4.010

The pH histogram shows a normal distribution with a mean of 3.3. Fewer than 1% of wines are close to very acidic, and none of them is basic on the pH scale.

Sulphates

Description: a wine additive which can contribute to sulfur dioxide gas (SO2) levels, which acts as an antimicrobial and antioxidant.

## [1] "Sulphates Statistics Summary (g/dm3):"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.3300  0.5500  0.6200  0.6581  0.7300  2.0000

The sulphates histogram shows a right-tailed distribution that peaks around 0.6 g/dm3. It also shows many gaps above 1.2 g/dm3. Less than 0.4% of the wines have more than 1.65 g/dm3 of sulphates.

Alcohol

Description: the percent alcohol content of the wine.

## [1] "Alcohol (% by volume) Summary Statistics:"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    8.40    9.50   10.20   10.42   11.10   14.90

The alcohol histogram shows a positively skewed distribution with a mode of 9.5 and a mean of about 10.4% by volume. The majority of wines have between 9 and 13% alcohol. I think wines of good quality will have a high percentage of alcohol. Let’s see whether I am right in the next sections.

Quality

Description: the wine quality (score between 0 and 10)

## [1] "Quntity Staistics Summary:"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.000   5.000   6.000   5.636   6.000   8.000

The quality percentile chart shows that around 80% of red wines have a quality level of 5 or 6. Only about 1.1% (18 of 1599) have a quality level greater than 7, which is a very small share. Now I am curious about the chemical components, and their amounts, in that small group of highest-quality red wines. What are the chemical properties that affect wine quality?
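
The shares can be recomputed directly from the quality counts printed at the start of the report. This sketch gives about 82.5% for levels 5 and 6 combined and about 1.1% for level 8; the variable names are my own.

```r
# Quality counts copied from the table() output at the top of the report.
quality_counts <- c("3" = 10, "4" = 53, "5" = 681, "6" = 638, "7" = 199, "8" = 18)

# Percentage of wines at each quality level
quality_pct <- 100 * quality_counts / sum(quality_counts)
round(quality_pct, 2)

mid_share <- sum(quality_pct[c("5", "6")])  # levels 5 and 6 combined
top_share <- quality_pct[["8"]]             # level 8 only (quality > 7)
```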

Univariate Analysis

The data has 1599 red wine observations with 12 features (described in detail in the previous section).

What is/are the main feature(s) of interest in your dataset?

After googling and exploring the data, I think the main features that might influence the quality of red wines are citric acid amount and alcohol percent.

What other features in the dataset do you think will help support your investigation into your feature(s) of interest?

Fixed acidity, volatile acidity, chlorides and residual sugar should help explain red wine quality, as they affect the taste of the wine, which is very important.

Bivariate Plots Section

Before I start plotting the data, let’s find the correlations between the different variables, just to be sure of the main features of interest:

## 
##   3   4   5   6   7   8 
##  10  53 681 638 199  18

The above matrix shows the following:

Now, let’s test and know more about the above associations.

Alcohol % by volume and wine quality

## 
##   3   4   5   6   7   8 
##  10  53 681 638 199  18

I think we can say there is a modest positive linear relationship between quality and alcohol percent. Because quality takes only integer values, the points form horizontal stripes. As alcohol % increases, quality tends to increase.

## 
##  Pearson's product-moment correlation
## 
## data:  redwine$quality and redwine$alcohol
## t = 21.639, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4373540 0.5132081
## sample estimates:
##       cor 
## 0.4761663
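
As a sanity check on the output above, the t statistic and the confidence interval can be rebuilt from the reported correlation and sample size alone. This is a sketch using the standard t formula and the Fisher z-transformation, not the original code chunk.

```r
r  <- 0.4761663   # sample correlation reported above
n  <- 1599        # number of wines
df <- n - 2       # degrees of freedom, 1597 as reported

# t statistic for testing that the true correlation is zero
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# 95% confidence interval via the Fisher z-transformation
z    <- atanh(r)
half <- qnorm(0.975) / sqrt(n - 3)
ci   <- tanh(c(z - half, z + half))

t_stat
ci
```

Both results agree with the printed values (t = 21.639, interval 0.437 to 0.513) up to rounding.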

I’d like to see the mean alcohol % for each quality level.

It seems that the highest-quality wines have the highest alcohol % by volume. The chart shows some exceptions to this pattern, such as wines with a quality level of 5, whose mean alcohol is the lowest of all levels. The blue line shows that the mean alcohol % follows a roughly linear relation with the quality level, so I think our assumption still holds.

Statistics Summary:

## # A tibble: 6 x 4
##   quality mean_alcohol_percent median_alcohol_percent total_count
##     <dbl>                <dbl>                  <dbl>       <int>
## 1       3             9.955000                  9.925          10
## 2       4            10.265094                 10.000          53
## 3       5             9.899706                  9.700         681
## 4       6            10.629519                 10.500         638
## 5       7            11.465913                 11.500         199
## 6       8            12.094444                 12.150          18
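
Two quick checks on this table, sketched with the values copied from it: the count-weighted average of the group means should reproduce the overall alcohol mean of 10.42 from the univariate section, and the step-by-step differences show where the upward trend breaks. The data frame name by_quality is my own.

```r
# Group means and counts copied from the tibble above.
by_quality <- data.frame(
  quality      = 3:8,
  mean_alcohol = c(9.955000, 10.265094, 9.899706, 10.629519, 11.465913, 12.094444),
  n            = c(10, 53, 681, 638, 199, 18)
)

# Weighted by group size, the group means give back the overall mean.
overall_mean <- weighted.mean(by_quality$mean_alcohol, by_quality$n)
overall_mean

# Change in mean alcohol from one quality level to the next;
# a negative value marks a break in the upward trend.
step_change <- diff(by_quality$mean_alcohol)
round(step_change, 3)

# Quality level with the lowest mean alcohol
lowest_level <- by_quality$quality[which.min(by_quality$mean_alcohol)]
```

The only negative step is from level 4 to level 5, which is the exception mentioned above.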

Citric acid vs. Wine Quality

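The correlation coefficient is less than 0.3, so I think there is no strong association between citric acid amount and wine quality. Note that the test output printed below was computed on fixed.acidity (see its data line), not on citric.acid.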

## 
##  Pearson's product-moment correlation
## 
## data:  redwine$quality and redwine$fixed.acidity
## t = 4.996, df = 1597, p-value = 6.496e-07
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.07548957 0.17202667
## sample estimates:
##       cor 
## 0.1240516
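
The data line in the output above names redwine$fixed.acidity, so that test does not measure citric acid. A sketch of the call on the citric acid column is shown here; it runs on only the first ten rows from the str() output (under the assumed name redwine_head), so its numbers are not comparable to a test on the full data set.

```r
# First ten rows of the two relevant columns, from the str() output.
redwine_head <- data.frame(
  citric.acid = c(0, 0, 0.04, 0.56, 0, 0, 0.06, 0, 0.02, 0.36),
  quality     = c(5, 5, 5, 6, 5, 5, 5, 7, 7, 5)
)

# Pearson correlation test between quality and citric acid.
# On the full data this would be cor.test(redwine$quality, redwine$citric.acid).
citric_test <- cor.test(redwine_head$quality, redwine_head$citric.acid)

citric_test$data.name   # confirms which columns were tested
citric_test$estimate    # sample correlation for these ten rows only
```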

After using jitter:

From the chart above, the mean citric acid amount (g/dm3) increases as we move from the lowest quality level to the highest. The differences are not large, but they still show a slight linear relationship.

Citric acid amount and wine quality statistics summary:

## Quality Level: 3
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0050  0.0350  0.1710  0.3275  0.6600 
## [1] "-------------------------------------------------------------------"
## Quality Level: 4
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0300  0.0900  0.1742  0.2700  1.0000 
## [1] "-------------------------------------------------------------------"
## Quality Level: 5
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0900  0.2300  0.2437  0.3600  0.7900 
## [1] "-------------------------------------------------------------------"
## Quality Level: 6
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0900  0.2600  0.2738  0.4300  0.7800 
## [1] "-------------------------------------------------------------------"
## Quality Level: 7
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.3050  0.4000  0.3752  0.4900  0.7600 
## [1] "-------------------------------------------------------------------"
## Quality Level: 8
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0300  0.3025  0.4200  0.3911  0.5300  0.7200 
## [1] "-------------------------------------------------------------------"
## NULL

Most wines with zero citric acid are of average quality. To see this, I will subset the wines with zero citric acid and look at their distribution across quality levels.

##   Quality Freq
## 1       3    3
## 2       4   10
## 3       5   57
## 4       6   54
## 5       7    8
##   Quality Freq Rel.Freq
## 1       3    3     2.27
## 2       4   10     7.58
## 3       5   57    43.18
## 4       6   54    40.91
## 5       7    8     6.06
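
The Rel.Freq column is simply each count divided by the total number of zero-citric-acid wines. A minimal sketch that rebuilds it from the Freq column above (the data frame name zero_citric is my own):

```r
# Counts of zero-citric-acid wines per quality level, from the table above.
zero_citric <- data.frame(Quality = 3:7, Freq = c(3, 10, 57, 54, 8))

# Relative frequency in percent, rounded to two decimals
zero_citric$Rel.Freq <- round(100 * zero_citric$Freq / sum(zero_citric$Freq), 2)
zero_citric

# Combined share of the average quality levels 5 and 6
avg_share <- sum(zero_citric$Rel.Freq[zero_citric$Quality %in% c(5, 6)])
```

Levels 5 and 6 together account for about 84% of these wines, close to their roughly 82% share in the full data set, so zero citric acid by itself does not shift the quality distribution much.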

Volatile Acidity vs. Quality

The graph below shows a moderate negative association between wine quality and volatile acidity.

The lowest-quality wines (level 3) include the maximum volatile acidity value (1.58 g/dm3), while the highest-quality wines (levels 7 and 8) have the lowest medians (0.37 g/dm3), and the overall minimum (0.12 g/dm3) belongs to level 7. That supports what was mentioned earlier: an increase in volatile acidity might lead to an unpleasant taste.

Volatile acidity amount and wine quality statistics summary:

## Quality Level: 3
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.4400  0.6475  0.8450  0.8845  1.0100  1.5800 
## [1] "-------------------------------------------------------------------"
## Quality Level: 4
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.230   0.530   0.670   0.694   0.870   1.130 
## [1] "-------------------------------------------------------------------"
## Quality Level: 5
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.180   0.460   0.580   0.577   0.670   1.330 
## [1] "-------------------------------------------------------------------"
## Quality Level: 6
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1600  0.3800  0.4900  0.4975  0.6000  1.0400 
## [1] "-------------------------------------------------------------------"
## Quality Level: 7
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1200  0.3000  0.3700  0.4039  0.4850  0.9150 
## [1] "-------------------------------------------------------------------"
## Quality Level: 8
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.2600  0.3350  0.3700  0.4233  0.4725  0.8500 
## [1] "-------------------------------------------------------------------"
## NULL
## 
## Call:
## lm(formula = quality ~ volatile.acidity, data = subset(redwine, 
##     volatile.acidity <= quantile(redwine$volatile.acidity, 0.999)))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.78977 -0.54547 -0.01325  0.47198  2.92568 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       6.55757    0.05841  112.27   <2e-16 ***
## volatile.acidity -1.74500    0.10503  -16.61   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7436 on 1596 degrees of freedom
## Multiple R-squared:  0.1474, Adjusted R-squared:  0.1469 
## F-statistic:   276 on 1 and 1596 DF,  p-value: < 2.2e-16

Building a linear model with volatile.acidity as the only predictor gives a small R-squared (about 0.15), so I think volatile acidity alone is not adequate to predict wine quality.
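
To see what the fitted line implies in practice, this sketch plugs the reported coefficients into the model equation for the median volatile acidity of the best and worst wines. The function name predict_quality is hypothetical; the coefficients and medians are copied from the output above.

```r
# Coefficients and R-squared copied from the lm() summary above.
intercept <- 6.55757
slope     <- -1.74500
r_squared <- 0.1474

# Predicted quality for a given volatile acidity (g/dm3)
predict_quality <- function(volatile_acidity) {
  intercept + slope * volatile_acidity
}

pred_level8 <- predict_quality(0.37)   # median volatile acidity at quality 8
pred_level3 <- predict_quality(0.845)  # median volatile acidity at quality 3

# For a single predictor, the correlation is the signed square root of R-squared.
implied_r <- sign(slope) * sqrt(r_squared)

c(pred_level8 = pred_level8, pred_level3 = pred_level3, implied_r = implied_r)
```

The two predictions differ by less than one quality point even though the actual levels are 8 and 3, which is another way of seeing that this single predictor explains little of the variation.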

Density and wine quality

## 
##  Pearson's product-moment correlation
## 
## data:  redwine$quality and redwine$density
## t = -7.0997, df = 1597, p-value = 1.875e-12
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.2220365 -0.1269870
## sample estimates:
##        cor 
## -0.1749192

There is no strong association between wine density and its quality.

Density and wine quality statistics summary:

## Quality Level: 3
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.9947  0.9962  0.9976  0.9975  0.9988  1.0010 
## [1] "-------------------------------------------------------------------"
## Quality Level: 4
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.9934  0.9956  0.9965  0.9965  0.9974  1.0010 
## [1] "-------------------------------------------------------------------"
## Quality Level: 5
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.9926  0.9962  0.9970  0.9971  0.9979  1.0030 
## [1] "-------------------------------------------------------------------"
## Quality Level: 6
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.9901  0.9954  0.9966  0.9966  0.9979  1.0040 
## [1] "-------------------------------------------------------------------"
## Quality Level: 7
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.9906  0.9948  0.9958  0.9961  0.9974  1.0030 
## [1] "-------------------------------------------------------------------"
## Quality Level: 8
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.9908  0.9942  0.9949  0.9952  0.9972  0.9988 
## [1] "-------------------------------------------------------------------"
## NULL
Density vs. Residual Sugar

The graph below goes against my intuition. It shows only a weak-to-moderate positive association between the two variables (r ≈ 0.36). Most wines have between 1.5 and 3 g/dm3 of residual sugar, so I can’t say that the variation in wine density in our sample is mainly due to differences in sugar amount.

## 
##  Pearson's product-moment correlation
## 
## data:  redwine$residual.sugar and redwine$density
## t = 15.189, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.3116908 0.3973835
## sample estimates:
##       cor 
## 0.3552834
Fixed Acidity vs. Citric Acid

There is a moderately strong positive association between citric acid amount and fixed acidity (r ≈ 0.67). The graph below shows that fixed acidity increases as citric acid increases.

## 
##  Pearson's product-moment correlation
## 
## data:  redwine$fixed.acidity and redwine$citric.acid
## t = 36.234, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.6438839 0.6977493
## sample estimates:
##       cor 
## 0.6717034
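
One way to compare the correlations reported so far is to square them, which gives the share of variance in one variable that a straight-line fit on the other would account for. The sketch below collects the coefficients printed in this section; the vector names are my own labels.

```r
# Pearson correlations copied from the cor.test() outputs in this section.
r <- c(
  alcohol_quality       = 0.4761663,
  fixed_acidity_quality = 0.1240516,
  density_quality       = -0.1749192,
  sugar_density         = 0.3552834,
  fixed_acidity_citric  = 0.6717034
)

# Share of variance explained by a simple linear fit, in percent
shared_pct <- round(100 * r^2, 1)
sort(shared_pct, decreasing = TRUE)
```

Seen this way, the fixed acidity and citric acid pair stands well apart from the rest, and alcohol is the strongest of the quality relationships listed here.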

Bivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

  • Wine quality correlates positively with alcohol percent (r ≈ 0.48); the highest-quality wines have the highest mean alcohol % by volume.
  • Wine quality also has a positive association with citric acid amount. Most wines have a small amount of citric acid that does not exceed 0.8 g/dm3. The majority of wines with the highest quality levels (7 or 8) have between 0.25 and 0.5 g/dm3 of citric acid. A high percentage of wines with a quality level of 3 have 0 g/dm3 of citric acid.
  • Volatile acidity has a negative correlation with wine quality. Most red wines with a quality level of 8 have a volatile acidity between 0.3 and 0.5 g/dm3, while wines with a quality level of 3 have a volatile acidity greater than 0.4 g/dm3.
  • In our sample, there is no strong linear relationship between density and wine quality. However, the graphs above show a slight association: the highest quality level has the lowest mean density and the lowest quality level has the highest mean density.

Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

Yes. There is a positive correlation between citric acid amount and fixed acidity. Density also correlates strongly with alcohol percent by volume.

What was the strongest relationship you found?

  • Wine quality is positively correlated with alcohol percent.
  • Wine quality is negatively correlated with volatile acidity (R-squared of about 0.15 in the single-predictor model).
  • Citric acid amount is also positively associated with red wine quality, but less strongly than volatile acidity and alcohol.

Multivariate Plots Section

There is a negative association between alcohol % by volume and density (g/cm3). Wines of average quality have a density between 0.995 and 1 g/cm3 and less than 11% alcohol by volume. As alcohol percent increases, wine quality tends to increase and density tends to decrease.

The majority of wines of average quality (levels 5 and 6) tend to have a small citric acid amount, less than 0.5 g/dm3. Wines of quality level 8 show a clear linear relationship between citric acid amount and density.

The density chart above confirms the negative correlation between volatile acidity and quality. The highest-quality wines occur more often at lower values of volatile acidity, while the lowest-quality wines tend to have higher volatile acidity.

The graph above shows that the highest-quality wines tend to occur where volatile acidity is less than 0.6 g/dm3 and citric acid is between 0.25 and 0.6 g/dm3, while the lowest-quality wines tend to have less citric acid (less than 0.125 g/dm3) and higher volatile acidity. Average-quality wines tend to occur where volatile acidity is between 0.2 and 0.8 g/dm3 and citric acid is at most 0.5 g/dm3. The yellow line shows the negative association between volatile acidity and citric acid.

Final Plots and Summary

Based on the previous analysis, we can conclude that red wine quality is most clearly associated with the following variables:

Most of the highest-quality wines tend to have lower volatile acidity, while most of the lowest-quality wines tend to have higher volatile acidity.

The graph above shows how the quality is affected by the alcohol % included. There is a clear positive relation that is represented with the black line (mean value for alcohol per each quality level).

Volatile acidity and alcohol % by volume are useful predictors of red wine quality. The graph above shows how each quality level of red wine is positioned with respect to those two variables. As mentioned before, the highest-quality red wines tend to have a small amount of volatile acidity and a high amount of alcohol compared to the average.

Reflection

Where did I run into difficulties in the analysis?

There was no strong correlation between the dependent and independent variables. There were outliers in each quality level, which might have affected the analysis. Our sample doesn’t include all quality levels (only 3 to 8 appear, on a scale of 0 to 10), so we don’t know whether our findings apply to wines at the extremes of the scale, such as 1 or 10.

Where did I find successes?

By visualizing bivariate and multivariate plots, we had the chance to see how wine quality relates to the amounts of the chemical components. Our final findings showed that wine quality is clearly associated with volatile acidity, citric acid amount and alcohol percent by volume. The analysis also confirmed our intuition that an increase in volatile acidity might decrease wine quality because of the undesired vinegar taste it can cause. The highest-quality wines tend to have a higher alcohol percent and lower volatile acidity.

How could the analysis be enriched in future work (e.g. additional data and analyses)?

I think the analysis could be enriched with more data from the missing quality levels, and with more categorical variables that describe the wine. We could also convert some numeric variables to categorical ones by mapping their values into tiers. Take pH, for example: it is measured on a scale from 0 (very acidic) to 14 (very basic), and most wines are between 3 and 4. We could map those values into categories such as: 0 to 3, very acidic; above 3 and up to 4, moderately acidic; above 4 and up to 8, less acidic; above 8, tending toward very basic. The analysis could also be enriched by checking the relationships among all the independent variables, as we might find some interesting combinations that affect wine quality.
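
The pH tiers suggested above could be built with cut(). This is a minimal sketch on the ten pH values from the str() output plus the reported minimum and maximum; the tier labels and variable names are my own wording of the categories described in the text.

```r
# pH of the first ten wines, from the str() output.
ph <- c(3.51, 3.2, 3.26, 3.16, 3.51, 3.51, 3.3, 3.39, 3.36, 3.35)

# Tier boundaries follow the text: (0,3], (3,4], (4,8], (8,14].
ph_breaks <- c(0, 3, 4, 8, 14)
ph_labels <- c("very acidic", "moderately acidic", "less acidic", "basic")

# cut() uses right-closed intervals by default, matching ">3 and <= 4".
ph_tier <- cut(ph, breaks = ph_breaks, labels = ph_labels, include.lowest = TRUE)
table(ph_tier)

# The reported minimum (2.74) and maximum (4.01) pH fall in the neighbouring tiers.
extremes <- cut(c(2.74, 4.01), breaks = ph_breaks, labels = ph_labels,
                include.lowest = TRUE)
as.character(extremes)
```

Since nearly all wines sit between 3 and 4, these particular tiers would put almost every observation in one category; narrower cut points inside that range would likely be more informative.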